On 2020-01-14 I run the analysis of Gene Ontology terms. Here I want to focus on part of those results and make some findings clearer. I will use results only from gene annotation, and not from annotation of transcripts, which is a somewhat noiser dataset. I will not look at the functions affected by hatching condition, since it only affects a couple of genes, to be inspected later. Among all cellular functions significantly enriched with genes differentially expressed among selective regimes, I will focus on those that are most significant, and detailed enough. For example, I skip terms like transmembrane transport, proteolysis or protein phosphorylation, because they are too general. Sections below are named for the functional categories that I think are worth discussing. I will try to contextualize the results. My entry point to the literature is (Garcia-Roger et al. 2019).
I need the results of the differential expression analysis (2020-01-08) to identify the genes annotated with the significant GO terms, and to assess the direction of gene expression change between selective regimes.
library(variancePartition)
library(GO.db)
library(ggplot2)
library(tidyr)
library(reactable)
ANNOTATION <- '../2019-07-26/genes/annotation.txt'
EXPRESSION <- '../2020-01-08/genes.RData' # variance partition and the mixed models fitted with dream()
The annotation table includes only the most specific terms that can be assigned to a gene. The more general, ancestor terms are assumed, of course, but not explicitly mentioned in the annotation. For the purpose of retrieving all genes annotated to a term, I need to expand the annotation, to make ancestor terms explicit. It took me a while to realize that I can do this with the unlist() function:
annotation <- read.table(ANNOTATION, col.names = c('gene', 'GOterms'), colClasses = c('character', 'character'))
head(annotation)
## gene GOterms
## 1 XLOC_000002 GO:0008417|GO:0016020|GO:0006486
## 2 XLOC_000009 GO:0043130|GO:0005515|GO:0043161
## 3 XLOC_000010 GO:0005840|GO:0015935|GO:0003735|GO:0006412
## 4 XLOC_000015 GO:0006355|GO:0003700|GO:0006357|GO:0043565|GO:0005634|GO:0032502
## 5 XLOC_000021 GO:0016567|GO:0006397|GO:0061630|GO:0008270
## 6 XLOC_000036 GO:0005272|GO:0006814|GO:0016020
annotation.list <- strsplit(annotation$GOterms, split='|', fixed=TRUE)
names(annotation.list) <- annotation$gene
# append can only join two vectors at a time. It works with lists!
allAncestors <- append(as.list(GOBPANCESTOR),
append(as.list(GOMFANCESTOR),
as.list(GOCCANCESTOR)))
fullAnnotation <- lapply(annotation.list,
FUN = function(x){
unique(append(x,
unlist(allAncestors[x], use.names=FALSE)))
})
load(EXPRESSION)
Without any attempt to characterize gene function any further, I will make the assumption that the more genes in a pathway are expressed, the more active the pathway is. It is, however, conceivable that some genes are annotated as involved in a function that they regulate negatively.
Dormant eggs are embryos. RNA was extracted either after forced diapause or not, but under hatching conditions. That is, a variable portion of eggs are expected to be leaving the dormancy stage and resuming metabolism, development and hatching. The portion of eggs getting ready to hatch is known to be larger in populations in regular environments.
It is among the most significantly enriched terms, with only 13 genes annotated.
goterm <- 'GO:0006289'
genes <- names(fullAnnotation[grep(goterm, fullAnnotation, fixed=TRUE)])
# Not all genes annotated have been used in expression analysis
genes <- genes[genes %in% row.names(varPart)]
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) +
geom_boxplot() + facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
The expression of NER genes is higher in dormant embryos from populations evolved in the unpredictable environment. No idea why.
goterm <- 'GO:0007186'
genes <- names(fullAnnotation[grep(goterm, fullAnnotation, fixed=TRUE)])
genes <- genes[genes %in% row.names(varPart)]
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
Among the 206 genes in this category, 55 have an adjusted p-value lower or equal to 0.1. There are too many genes annotated with this function to plot all their stratified expression levels. I select the most differentially expressed ones.
z2 <- z2[z2$adj.P.Val < 0.1,]
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) +
geom_boxplot() + facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
Among the top 206 genes analysed from the G protein-coupled receptor signaling pathway, 173 have a reduced expression level in the unpredictable environment, and 33 show an increased expression level. Thus, in general we expect dormant eggs from populations under the unpredictable regime to be less sensitive to this particular pathway of signal transduction.
goterm <- 'GO:0007165'
genes <- names(fullAnnotation[grep(goterm, fullAnnotation, fixed=TRUE)])
genes <- genes[genes %in% row.names(varPart)]
z0 <- varPart[genes,]
z0 <- z0[order(z0$regime, decreasing=TRUE),]
z0$gene <- factor(row.names(z0), levels=row.names(z0), ordered=TRUE)
z0 <- pivot_longer(z0, cols=1:4, names_to='variable', values_to='variancePart')
z0$variable <- factor(z0$variable, levels=c('regime', 'treatment', 'population', 'Residuals'), ordered=TRUE)
ggplot(data=z0, mapping=aes(x=gene, y=variancePart, fill=variable)) +
geom_bar(stat='identity') + theme(axis.text.x=element_blank()) + ylab('Portion of explained variance') +
ggtitle(Term(GOTERM[goterm]))
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
544 genes are involved in signal transduction. I focus on the 129 most significant ones.
z2 <- z2[z2$adj.P.Val < 0.1,]
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) +
geom_boxplot() + facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
It seems that in general, signal transduction activity must be lower in resting eggs from populations under the unpredictable regime. This suggests that in unpredictable environments resting eggs rely less in (unreliable) environmental cues, and probably more in endogenous ones.
goterm <- 'GO:0007160'
genes <- names(fullAnnotation[grep(goterm, fullAnnotation, fixed=TRUE)])
genes <- genes[genes %in% row.names(varPart)]
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) +
geom_boxplot() + facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
Among the top 16 genes involved in cell-matrix adhesion, 14 have a reduced expression level in the unpredictable environment, and 2 show an increased expression level.
Cell-matrix adhesion is likely related to embryonic development.
Several functions related to ion transport are significant. I wonder to what extent their significance is not dependent on a common subset of genes.
goterm_K <- 'GO:0006813'
goterm_COOH <- 'GO:0046942'
goterm_Sulf <- 'GO:0008272'
goterm_Nucl <- 'GO:1901642'
goterm_TM <- 'GO:0034220'
genes_K <- names(fullAnnotation[grep(goterm_K, fullAnnotation, fixed=TRUE)])
genes_COOH <- names(fullAnnotation[grep(goterm_COOH, fullAnnotation, fixed=TRUE)])
genes_Sulf <- names(fullAnnotation[grep(goterm_Sulf, fullAnnotation, fixed=TRUE)])
genes_Nucl <- names(fullAnnotation[grep(goterm_Nucl, fullAnnotation, fixed=TRUE)])
genes_TM <- names(fullAnnotation[grep(goterm_TM, fullAnnotation, fixed=TRUE)])
vennDiagram(vennCounts(cbind(Potassium = row.names(varPart) %in% genes_K,
Carboxilic = row.names(varPart) %in% genes_COOH,
Sulfate = row.names(varPart) %in% genes_Sulf,
Nucleoside = row.names(varPart) %in% genes_Nucl,
Transmembrane = row.names(varPart) %in% genes_TM)))
There is hardly any overlap among genes involved in the transport of these substances. The only remarkable overlap happens between potassium ion transport and ion transmembrane transport. But, still the significance of each term should be quite independent of the significance of any other term. This was expected, because the analysis of functional enrichment applied algorithms to account for the dependence structure among GO terms. I analyse them separately, below.
In summary, most genes involved in ion transport, and also those involved in nucleoside transport, are expressed at lower levels in resting eggs from populations submitted to the random regime.
goterm <- goterm_K
genes <- genes_K # defined above
genes <- genes[genes %in% row.names(varPart)]
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
#plotPercentBars(varPart[genes,])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) +
geom_boxplot() + facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
goterm <- goterm_TM
genes <- genes_TM # defined above.
genes <- genes[genes %in% row.names(varPart)]
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
z2 <- z2[z2$adj.P.Val <= 0.1,]
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) + geom_boxplot() +
facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
goterm <- goterm_Nucl
genes <- genes_Nucl
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) + geom_boxplot() +
facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
goterm <- goterm_COOH
genes <- genes_COOH
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
# I wanted the table to be ordered by FDR, but I need it now ordered by logFC:
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) +
geom_boxplot() + facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
Trehalose is…
goterm <- 'GO:0005992'
genes <- names(fullAnnotation[grep(goterm, fullAnnotation, fixed=TRUE)])
genes <- genes[genes %in% row.names(varPart)]
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) + geom_boxplot() +
facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
goterm_cellmoti <- 'GO:0048870' # cell motility
goterm_movement <- 'GO:0003341' # cillium movement
goterm_assembly <- 'GO:0060271' # cillium assembly
genes_cellmoti <- names(fullAnnotation[grep(goterm_cellmoti, fullAnnotation, fixed=TRUE)])
genes_movement <- names(fullAnnotation[grep(goterm_movement, fullAnnotation, fixed=TRUE)])
genes_assembly <- names(fullAnnotation[grep(goterm_assembly, fullAnnotation, fixed=TRUE)])
vennDiagram(vennCounts(cbind('Cell motility' = row.names(varPart) %in% genes_cellmoti,
'Cillium movement' = row.names(varPart) %in% genes_movement,
'Cillium assembly' = row.names(varPart) %in% genes_assembly)))
goterm <- goterm_cellmoti
genes <- genes_cellmoti
genes <- genes[genes %in% row.names(varPart)]
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) +
geom_boxplot() + facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
goterm <- goterm_movement
genes <- genes_movement
genes <- genes[genes %in% row.names(varPart)]
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) +
geom_boxplot() + facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
goterm <- goterm_assembly
genes <- genes_assembly
genes <- genes[genes %in% row.names(varPart)]
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) +
geom_boxplot() + facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
goterm <- 'GO:0043161'
genes <- names(fullAnnotation[grep(goterm, fullAnnotation, fixed=TRUE)])
genes <- genes[genes %in% row.names(varPart)]
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) +
geom_boxplot() + facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
goterm <- 'GO:0006979'
genes <- names(fullAnnotation[grep(goterm, fullAnnotation, fixed=TRUE)])
genes <- genes[genes %in% row.names(varPart)]
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])], c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) +
geom_boxplot() + facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
goterm <- 'GO:0003774'
genes <- names(fullAnnotation[grep(goterm, fullAnnotation, fixed=TRUE)])
genes <- genes[genes %in% row.names(varPart)]
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
# I limit the number of genes.
z2 <- z2[z2$adj.P.Val <= 0.1,]
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) +
geom_boxplot() + facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
goterm_metallocarboxy <- 'GO:0004181'
goterm_metalloendopep <- 'GO:0004222'
goterm_serinetypeendo <- 'GO:0004252'
goterm_calciumcystein <- 'GO:0004198'
genes_metallocarboxy <- names(fullAnnotation[grep(goterm_metallocarboxy, fullAnnotation, fixed=TRUE)])
genes_metalloendopep <- names(fullAnnotation[grep(goterm_metalloendopep, fullAnnotation, fixed=TRUE)])
genes_serinetypeendo <- names(fullAnnotation[grep(goterm_serinetypeendo, fullAnnotation, fixed=TRUE)])
genes_calciumcystein <- names(fullAnnotation[grep(goterm_calciumcystein, fullAnnotation, fixed=TRUE)])
vennDiagram(vennCounts(cbind(Metallocarboxy. = row.names(varPart) %in% genes_metallocarboxy,
Metalloendo. = row.names(varPart) %in% genes_metalloendopep,
Serine_type = row.names(varPart) %in% genes_serinetypeendo,
Cystein_type = row.names(varPart) %in% genes_calciumcystein)))
goterm <- 'GO:0004181'
Term(GOTERM[goterm])
## GO:0004181
## "metallocarboxypeptidase activity"
genes <- names(fullAnnotation[grep(goterm, fullAnnotation, fixed=TRUE)])
genes <- genes[genes %in% row.names(varPart)]
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) +
geom_boxplot() + facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
goterm <- 'GO:0004252'
Term(GOTERM[goterm])
## GO:0004252
## "serine-type endopeptidase activity"
genes <- names(fullAnnotation[grep(goterm, fullAnnotation, fixed=TRUE)])
genes <- genes[genes %in% row.names(varPart)]
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
z2 <- z2[z2$adj.P.Val <= 0.1,] # only significantly regulated genes
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) +
geom_boxplot() + facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
goterm <- 'GO:0004198'
Term(GOTERM[goterm])
## GO:0004198
## "calcium-dependent cysteine-type endopeptidase activity"
genes <- names(fullAnnotation[grep(goterm, fullAnnotation, fixed=TRUE)])
genes <- genes[genes %in% row.names(varPart)]
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
z2 <- z2[z2$adj.P.Val <= 0.13,] # only significantly regulated genes
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) +
geom_boxplot() + facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
goterm <- 'GO:0004222'
Term(GOTERM[goterm])
## GO:0004222
## "metalloendopeptidase activity"
genes <- names(fullAnnotation[grep(goterm, fullAnnotation, fixed=TRUE)])
genes <- genes[genes %in% row.names(varPart)]
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
z2 <- z2[z2$adj.P.Val <= 0.11,] # only significantly regulated genes
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) +
geom_boxplot() + facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
goterm <- 'GO:0016715'
Term(GOTERM[goterm])
## GO:0016715
## "oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, reduced ascorbate as one donor, and incorporation of one atom of oxygen"
genes <- names(fullAnnotation[grep(goterm, fullAnnotation, fixed=TRUE)])
genes <- genes[genes %in% row.names(varPart)]
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
#z2 <- z2[z2$adj.P.Val <= 0.11,] # only significantly regulated genes
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) +
geom_boxplot() + facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
sessionInfo()
## R version 3.6.2 (2019-12-12)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Debian GNU/Linux 9 (stretch)
##
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.7.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.7.0
##
## locale:
## [1] LC_CTYPE=ca_ES.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=ca_ES.UTF-8 LC_COLLATE=ca_ES.UTF-8
## [5] LC_MONETARY=ca_ES.UTF-8 LC_MESSAGES=ca_ES.UTF-8
## [7] LC_PAPER=ca_ES.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=ca_ES.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] reactable_0.1.0 tidyr_1.0.2 GO.db_3.10.0
## [4] AnnotationDbi_1.48.0 IRanges_2.20.2 S4Vectors_0.24.3
## [7] variancePartition_1.16.1 Biobase_2.46.0 BiocGenerics_0.32.0
## [10] scales_1.1.0 foreach_1.4.7 limma_3.42.0
## [13] ggplot2_3.2.1
##
## loaded via a namespace (and not attached):
## [1] viridisLite_0.3.0 jsonlite_1.6 bit64_0.9-7
## [4] splines_3.6.2 gtools_3.8.1 assertthat_0.2.1
## [7] blob_1.2.1 yaml_2.2.0 progress_1.2.2
## [10] pillar_1.4.3 RSQLite_2.2.0 lattice_0.20-38
## [13] glue_1.3.1 digest_0.6.23 minqa_1.2.4
## [16] colorspace_1.4-1 htmltools_0.4.0 Matrix_1.2-18
## [19] plyr_1.8.5 reactR_0.4.2 pkgconfig_2.0.3
## [22] purrr_0.3.3 gdata_2.18.0 BiocParallel_1.20.1
## [25] lme4_1.1-21 tibble_2.1.3 farver_2.0.3
## [28] withr_2.1.2 lazyeval_0.2.2 pbkrtest_0.4-7
## [31] magrittr_1.5 crayon_1.3.4 memoise_1.1.0
## [34] evaluate_0.14 doParallel_1.0.15 nlme_3.1-143
## [37] MASS_7.3-51.5 gplots_3.0.1.2 tools_3.6.2
## [40] prettyunits_1.1.1 hms_0.5.3 lifecycle_0.1.0
## [43] stringr_1.4.0 munsell_0.5.0 colorRamps_2.3
## [46] compiler_3.6.2 caTools_1.18.0 rlang_0.4.3
## [49] grid_3.6.2 nloptr_1.2.1 iterators_1.0.12
## [52] htmlwidgets_1.5.1 labeling_0.3 bitops_1.0-6
## [55] rmarkdown_2.1 boot_1.3-24 gtable_0.3.0
## [58] codetools_0.2-16 DBI_1.1.0 reshape2_1.4.3
## [61] R6_2.4.1 knitr_1.27 dplyr_0.8.3
## [64] bit_1.1-15.1 KernSmooth_2.23-16 stringi_1.4.5
## [67] Rcpp_1.0.3 vctrs_0.2.2 tidyselect_1.0.0
## [70] xfun_0.12
Garcia-Roger, Eduardo M., Esther Lubzens, Diego Fontaneto, and Manuel Serra. 2019. “Facing Adversity: Dormant Embryos in Rotifers.” The Biological Bulletin 237 (2): 119–44. https://doi.org/10.1086/705701.